24TopicsInStatML

Chelsea Parlett-Pelleriti

I Fit a Bayesian Model, Now What?

Bayesian Review

💡 in some very simple settings, we can use conjugate priors that make updating Prior ➡️ Posterior easy (e.g. the beta-binomial example we did by hand in class)

⚠️ in most other cases, we need to use MCMC to approximate our posterior distribution, and end up with draws from the posterior
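To make the contrast concrete, here’s a base-R sketch of the beta-binomial conjugate update alongside a draw-based approximation (the prior and data values here are made up for illustration):

```r
# Beta-Binomial: conjugate update by hand vs. approximating with draws.
# Hypothetical setup: Beta(2, 2) prior, 7 successes in 10 trials.
a <- 2; b <- 2
k <- 7; n <- 10

# conjugate update: posterior is Beta(a + k, b + n - k), no sampling needed
post_a <- a + k
post_b <- b + n - k
exact_mean <- post_a / (post_a + post_b)

# in non-conjugate models we only get draws; here we can sample the
# known posterior directly to mimic that workflow
set.seed(1)
theta_draws <- rbeta(4000, post_a, post_b)
c(exact = exact_mean, approx = mean(theta_draws))
```

With enough draws, the approximate mean lands very close to the exact conjugate answer, which is exactly why draws are a usable stand-in for the posterior.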

Overview

We’ve spent a lot of time on how to interpret statistical tests and parameter estimates from Bayesian models, and even some time on how to fit these models. But what do we do with the outputs of these models?

  • Manipulating Posterior Draws

  • Marginal Effects

Manipulating Posterior Draws

With modern Bayesian methods, we usually use a sampling algorithm like MCMC to get draws from our posterior distribution.

  • draw: one individual sample from the posterior distribution; contains one value per parameter in \(\theta\)

  • chain: the Markov chain used to generate samples of \(\theta\); each iteration of the chain produces one draw

Manipulating Posterior Draws

If we’ve reached convergence, the draws will be samples from \(p(\theta \mid x)\) (our posterior), and we can use them to approximate the posterior distribution itself.

Manipulating Posterior Draws

While software like rstan and brms will often provide default summaries (mean, 95% Credible Interval, etc) of the posterior draws, it can be useful to know how to manipulate them yourself to calculate custom quantities.

  • approximate the probability of \(\theta\) being in a custom range

  • diagnose chain-specific issues

  • use a loss function \(f(x)\) to approximate the risk/reward of your estimates being wrong
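As a sketch of these custom calculations (with simulated draws standing in for real posterior draws, and a made-up asymmetric loss function):

```r
# Custom summaries from a vector of posterior draws (simulated here).
set.seed(2)
theta_draws <- rnorm(4000, mean = 5.2, sd = 0.8)

# approximate P(theta in a custom range), e.g. P(4 < theta < 6)
p_range <- mean(theta_draws > 4 & theta_draws < 6)

# expected loss of a point estimate under a hypothetical asymmetric
# loss function: underestimates cost twice as much as overestimates
loss <- function(theta, estimate) {
  ifelse(estimate < theta, 2 * (theta - estimate), estimate - theta)
}
expected_loss <- mean(loss(theta_draws, estimate = 5))
```

Both quantities are just functions applied across the draws and then averaged, which is the general pattern for any custom posterior summary.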

Manipulating Posterior Draws

Use the data here to analyze the relationship between various personal/demographic variables and schools’ standardized test scores. Fit a Bayesian Mixed Effects GLM appropriate for the question and use the output and ggplots to answer the questions below.

Manipulating Posterior Draws

💡Remember, each draw from the posterior is a draw from the multivariate posterior distribution: you get one sample of each parameter in your model.

🧠 think of each draw as a universe in the multiverse: it’s one reasonable way the world could look based on your posterior*

*also keep this in mind: this is why we do calculations at the draw level first, then aggregate (e.g. when computing the posterior of a derived quantity like BMI)
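A minimal illustration of draw-level-first, using made-up posterior draws for weight and height to get the posterior of BMI:

```r
# A derived quantity's posterior comes from computing it per draw,
# then aggregating -- not from combining summaries of the parts.
set.seed(3)
weight <- rnorm(4000, 70, 5)       # hypothetical posterior draws (kg)
height <- rnorm(4000, 1.75, 0.05)  # hypothetical posterior draws (m)

# draw level first: BMI in each "universe"...
bmi_draws <- weight / height^2

# ...then aggregate into a posterior summary
quantile(bmi_draws, c(0.025, 0.5, 0.975))
```

Summaries of summaries (e.g. `mean(weight) / mean(height)^2`) would give you a point value but throw away the uncertainty in the derived quantity entirely.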

Manipulating Posterior Draws

library(brms)
library(tidyverse)
library(bayesplot)
library(tidybayes)

data <- read.csv("/Users/chelseaparlett/Desktop/CPSC540ParlettPellerti/Data/23cw2.csv")

# fit model
mm <- brm(standardized_test_score ~ hair_color + sleep_hours + grade_level + caloric_intake_per_day + (1 | school_district),
          data = data)
# print out variable names
get_variables(mm)

# get raw draws
draws <- mm %>% spread_draws(b_sleep_hours)

# custom calc
mean(draws$b_sleep_hours > 5)

Extracting Posterior Draws

the spread_draws() function from tidybayes extracts the raw draws from a Bayesian model and puts them in a tidy format.

draws <- mm %>% spread_draws(b_sleep_hours)
glimpse(draws)
Rows: 4,000
Columns: 4
$ .chain        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ .iteration    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
$ .draw         <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
$ b_sleep_hours <dbl> 5.576268, 4.780464, 5.048142, 5.537859, 5.361288, 5.0224…

Extracting Posterior Draws

we can also extract multiple parameters (for example, all the parameters for the Random Intercepts of District).

# print out variable names
get_variables(mm)
 [1] "b_Intercept"                            
 [2] "b_hair_colorBlonde"                     
 [3] "b_hair_colorBrown"                      
 [4] "b_hair_colorOther"                      
 [5] "b_sleep_hours"                          
 [6] "b_grade_level"                          
 [7] "b_caloric_intake_per_day"               
 [8] "sd_school_district__Intercept"          
 [9] "sigma"                                  
[10] "Intercept"                              
[11] "r_school_district[District.A,Intercept]"
[12] "r_school_district[District.B,Intercept]"
[13] "r_school_district[District.C,Intercept]"
[14] "r_school_district[District.D,Intercept]"
[15] "r_school_district[District.E,Intercept]"
[16] "lprior"                                 
[17] "lp__"                                   
[18] "accept_stat__"                          
[19] "stepsize__"                             
[20] "treedepth__"                            
[21] "n_leapfrog__"                           
[22] "divergent__"                            
[23] "energy__"                               
# get Random Intercept Draws
draws <- mm %>% spread_draws(r_school_district[District,])
glimpse(draws)
Rows: 20,000
Columns: 5
Groups: District [5]
$ District          <chr> "District.A", "District.A", "District.A", "District.…
$ r_school_district <dbl> -1.4819239, 2.0013556, 1.5716733, 6.7026422, 1.22519…
$ .chain            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ .iteration        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1…
$ .draw             <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1…

Extracting Posterior Draws

we can also grab multiple parameters of different types.

# get Random Intercept Draws
draws <- mm %>% spread_draws(`b_Intercept`, r_school_district[District,])
glimpse(draws)
Rows: 20,000
Columns: 6
Groups: District [5]
$ .chain            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ .iteration        <int> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4…
$ .draw             <int> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4…
$ b_Intercept       <dbl> 50.68877, 50.68877, 50.68877, 50.68877, 50.68877, 54…
$ District          <chr> "District.A", "District.B", "District.C", "District.…
$ r_school_district <dbl> -1.4819239, -4.4335159, 1.5671042, 6.4785248, -9.173…

Extracting Posterior Draws

we can also grab multiple parameters of different types and manipulate them to get quantities we’re interested in.

# get Random Intercept Draws
draws <- mm %>% spread_draws(`b_Intercept`, r_school_district[District,]) %>%
  mutate(District_mean = b_Intercept +  r_school_district)
head(draws)
# A tibble: 6 × 7
# Groups:   District [5]
  .chain .iteration .draw b_Intercept District   r_school_district District_mean
   <int>      <int> <int>       <dbl> <chr>                  <dbl>         <dbl>
1      1          1     1        50.7 District.A             -1.48          49.2
2      1          1     1        50.7 District.B             -4.43          46.3
3      1          1     1        50.7 District.C              1.57          52.3
4      1          1     1        50.7 District.D              6.48          57.2
5      1          1     1        50.7 District.E             -9.17          41.5
6      1          2     2        54.5 District.A              2.00          56.5

Extracting Posterior Draws

these are often values that are more interpretable and reportable to a general audience. For example, instead of reporting a posterior for the Intercept and a posterior for the District Offsets, we can report District Mean posteriors.

# get Random Intercept Draws
draws <- mm %>% spread_draws(`b_Intercept`, r_school_district[District,]) %>%
  mutate(District_mean = b_Intercept +  r_school_district)

ggplot(draws, aes(y = District, x = District_mean)) + stat_halfeye()

Extracting Posterior Draws

This also allows us to custom add references to the plots, like a 95% HDI for the Overall Intercept.

# get Random Intercept Draws
draws <- mm %>% spread_draws(`b_Intercept`, r_school_district[District,]) %>%
  mutate(District_mean = b_Intercept +  r_school_district)

mean_overall <-  median_qi(draws %>% ungroup(),
                           intercept_mean = b_Intercept,
                           .width = c(.95, .50))

ggplot(draws, aes(y = District, x = District_mean)) + stat_halfeye() + 
  geom_vline(xintercept = mean_overall[[1, "intercept_mean"]], color = "red") + 
  annotate(geom = "rect",
           xmin = mean_overall[[1, ".lower"]],
           xmax = mean_overall[[1, ".upper"]],
           ymin = 0,
           ymax = 6,
           fill = "red", alpha = 0.2)

Summarizing Posterior Draws

Again, we can use these draws to get values and summaries that are easier for most people to understand.

# get Random Intercept Draws
draws <- mm %>% spread_draws(`b_Intercept`, r_school_district[District,]) %>%
  mutate(District_mean = b_Intercept +  r_school_district)

draws |> median_qi(District_mean = b_Intercept +  r_school_district, .width = c(0.89,0.5))
# A tibble: 10 × 7
   District   District_mean .lower .upper .width .point .interval
   <chr>              <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
 1 District.A          52.4   47.9   57.0   0.89 median qi       
 2 District.B          48.4   43.9   52.9   0.89 median qi       
 3 District.C          55.0   50.5   59.5   0.89 median qi       
 4 District.D          60.3   55.7   65.0   0.89 median qi       
 5 District.E          45.0   40.4   49.5   0.89 median qi       
 6 District.A          52.4   50.5   54.4   0.5  median qi       
 7 District.B          48.4   46.5   50.3   0.5  median qi       
 8 District.C          55.0   53.0   56.8   0.5  median qi       
 9 District.D          60.3   58.3   62.3   0.5  median qi       
10 District.E          45.0   43.0   46.8   0.5  median qi       

Visualizing Posterior Draws

We want to know whether the posteriors of mean scores for different hair colors are different, and if any of them provide evidence that scores are “average” (between 45-55).

draws <- mm %>% spread_draws(`b_Intercept`,b_hair_colorBlonde, b_hair_colorBrown, b_hair_colorOther) |>
  mutate(Black = b_Intercept,
          Brown = b_Intercept + b_hair_colorBrown,
          Blonde = b_Intercept + b_hair_colorBlonde,
          Other = b_Intercept + b_hair_colorOther) |>
  select(-c(b_Intercept,b_hair_colorBlonde, b_hair_colorBrown, b_hair_colorOther)) |>
  pivot_longer(cols = c("Black", "Brown", "Blonde", "Other"))
draws |> head(3)
# A tibble: 3 × 5
  .chain .iteration .draw name   value
   <int>      <int> <int> <chr>  <dbl>
1      1          1     1 Black   50.7
2      1          1     1 Brown   51.3
3      1          1     1 Blonde  50.5

Visualizing Posterior Draws

We want to know whether the posteriors of mean scores for different hair colors are different, and if any of them provide evidence that scores are “average” (between 45-55).

ggplot(draws, aes(y = name, x = value, fill = after_stat(abs(x-50) < 5))) + stat_halfeye() +
  geom_vline(xintercept = c(45, 55), linetype = "dashed") +
  scale_fill_manual(values = c("gray80", "skyblue")) + theme(legend.position = "none")

Posterior Predictions

(s/o to Andrew Heiss and his wonderful blog posts on these topics, highly recommend 1, 2)

Setting up the Model

🐧

library(brms)
library(tidybayes)        
library(marginaleffects)  
library(palmerpenguins)  # Penguin Data
penguins <- na.omit(penguins)
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = sex)) + geom_point(size = 5, alpha = 0.2) + 
  theme_minimal() + labs(x = "Bill Length (mm)",
                         y = "Bill Depth (mm)",
                         title = "Penguin Bill Measurement by Sex") + 
  theme(panel.grid.minor = element_blank(),
        panel.grid.major = element_line(linewidth = 0.4, linetype = "dashed"),
        legend.position = "bottom")


Setting Up the Model

log_model <- brm(
  sex ~ bill_length_mm + bill_depth_mm ,
  family = bernoulli(),
  data = penguins
)
summary(log_model)
 Family: bernoulli 
  Links: mu = logit 
Formula: sex ~ bill_length_mm + bill_depth_mm 
   Data: penguins (Number of observations: 333) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept        -25.33      2.95   -31.16   -19.78 1.00     2111     1995
bill_length_mm     0.28      0.04     0.21     0.35 1.00     2323     2411
bill_depth_mm      0.78      0.10     0.60     0.98 1.00     2377     2333

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Working with Posterior Distributions

\[ \mathbb{E}\left (y | \mathbf{X}\right ) = \mu = g^{-1}\left (\mathbf{X}\beta \right ) \\ p\left (y|x\right ) \sim \pi\left (\mu,...\right ) \]

  • Linear Predictions: \(\mathbf{X}\beta\)

  • Expected Value Predictions: \(\mathbb{E}\left (y | \mathbf{X}\right ) = g^{-1}\left (\mathbf{X}\beta \right )\)

  • Posterior Predictions: \(p\left (\tilde{y}|y \right)\)

Linear Predictions

\[ \mathbb{E}\left (y | \mathbf{X}\right ) = \mu = g^{-1}\left (\mathbf{X}\beta \right ) \\ p\left (y|x\right ) \sim \pi\left (\mu,...\right ) \]

Linear Predictions are draws from the linear prediction \(\mathbf{X}\beta\) that are linear combinations of the columns of our design matrix before transforming them with the link function.

In logistic regression, this would be predictions on the log odds scale (“what are the predicted log odds of being registered to vote?”)
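A hand-computed sketch of one linear prediction, using values close to the posterior means from the penguin model’s summary above (the specific penguin measurements are made up):

```r
# one linear prediction by hand, using coefficients near the
# posterior means from the penguin model's summary
beta <- c(intercept = -25.33, bill_length_mm = 0.28, bill_depth_mm = 0.78)
x <- c(1, 45, 17)  # a penguin with 45 mm bill length, 17 mm bill depth

log_odds <- sum(beta * x)  # the linear prediction X*beta (log odds scale)
prob <- plogis(log_odds)   # inverse logit: back to the probability scale
```

posterior_linpred() does this for every posterior draw of \(\beta\) and every row of the data, which is where the draws \(\times\) observations matrix below comes from.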

Linear Predictions

❓ we get out a \(4000 \times 333\) dimensional matrix as output. Where do those two numbers come from???

dim(penguins)
[1] 333   8
lin_preds <- posterior_linpred(log_model)
# lin_preds_long <-  linpred_draws(log_model, newdata = penguins) # get data in long format
dim(lin_preds)
[1] 4000  333

Expected Value Predictions

\[ \mathbb{E}\left (y | \mathbf{X}\right ) = \mu = g^{-1}\left (\mathbf{X}\beta \right ) \\ p\left (y|x\right ) \sim \pi\left (\mu,...\right ) \]

Expected Value Predictions are draws of the expected values \(\mu\) that are linear predictions that have been transformed using the link function.

In logistic regression, this would be predicted probabilities (“what is the predicted probability of being registered to vote?”)

Expected Value Predictions

e_preds <- posterior_epred(log_model)
# e_preds_long <-  epred_draws(log_model, newdata = penguins) # get data in long format
dim(e_preds)
[1] 4000  333

Expected Value Predictions

e_preds_long <- epred_draws(log_model, newdata = penguins) # long format keeps penguin columns for plotting
ggplot(e_preds_long, aes(x = .epred, fill = species)) + stat_halfeye() + labs(x = "Predicted Probability", y = "") + theme_minimal() + 
  facet_wrap(~species)

Expected Value Predictions

e_preds_long <- epred_draws(log_model, newdata = penguins) # long format keeps penguin columns for plotting
ggplot(e_preds_long, aes(x = .epred, fill = species)) + stat_halfeye() + labs(x = "Predicted Probability", y = "") + theme_minimal() + 
  facet_grid(sex~species)

Predictive Distributions

Two sources of variability:

  1. Distributional (Prior or Posterior) Variability

  2. Sampling (Data) Variability

Predictive Distributions

Distributional Variability

How certain are we about the value of the parameters?

Predictive Distributions

Sampling Variability

How much will observed data typically vary from its expected value? a.k.a. inherent variability in the data around the true value.

Predictive Distributions

\[ p(\tilde{x} \mid x) \]

Probability of new sample \(\tilde{x}\) given the current data \(x\).

  1. draw \(\theta_i \sim \pi(\theta)\) (one sample of all the model parameters)

  2. draw \(y_i \sim p(y_i \mid \theta_i)\) (sample of data from model implied by \(\theta_i\))

1 accounts for uncertainty about the parameters \(\theta\), 2 accounts for uncertainty in the sample
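These two steps can be sketched in base R for a simple normal model (the “posterior” draws of \(\mu\) and \(\sigma\) here are simulated stand-ins):

```r
set.seed(4)
n_draws <- 4000

# step 1: one draw of all the parameters per iteration (simulated here)
mu_draws    <- rnorm(n_draws, 50, 2)
sigma_draws <- sqrt(1 / rgamma(n_draws, 20, 40))

# step 2: one new data point per parameter draw
y_pred <- rnorm(n_draws, mean = mu_draws, sd = sigma_draws)

# the predictive draws carry both sources of uncertainty, so they are
# more spread out than the draws of mu alone
sd(y_pred) > sd(mu_draws)
```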

Posterior Predictions

\[ \mathbb{E}\left (y | \mathbf{X}\right ) = \mu = g^{-1}\left (\mathbf{X}\beta \right ) \\ p\left (y|x\right ) \sim \pi\left (\mu,...\right ) \]

Posterior Predictions are draws of data \(y_{rep}\): essentially new samples of data based on the model’s posterior distribution, \(p(\tilde{y} \mid y)\).

In logistic regression, this would be new samples of binary (0/1) data that are based on the parameters learned by the model.
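Conceptually, for a Bernoulli model posterior_predict() amounts to flipping a coin for every cell of the epred matrix. A sketch with a fake probability matrix standing in for real posterior_epred() output:

```r
set.seed(5)
# stand-in for posterior_epred() output: a draws x observations
# matrix of predicted probabilities
e_preds <- matrix(runif(4000 * 5), nrow = 4000)

# one simulated 0/1 outcome per draw per observation
p_preds <- matrix(rbinom(length(e_preds), size = 1, prob = e_preds),
                  nrow = nrow(e_preds))

dim(p_preds)  # same draws x observations shape as e_preds
```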

Posterior Predictions

p_preds <- posterior_predict(log_model)
# p_preds_long <-  predicted_draws(log_model, newdata = penguins) # get data in long format
dim(p_preds)
[1] 4000  333

Working With Posterior Distributions

Andrew Heiss made this great cheat sheet (though his notation is a little different from ours).

from Andrew Heiss’s Blog

Further Reading

Marginal Effects

Now that you know how to work with posterior draws and predictions, you’re ready to learn about Marginal Effects!

Vincent Arel-Bundock’s Marginal Effects Zoo

What are MEs?

Unfortunately, there’s not enough time to go into depth with Marginal Effects, but I want you to be aware of what they are, because they’ll put you head and shoulders above your peers when it comes to communicating your results with non-experts. Vincent Arel-Bundock’s book Model to Meaning, and Andrew Heiss’s blog Marginalia are great resources to jump in here. I’ll be pulling from both a lot here!

What are MEs?

Marginal Effects refer to two (nearly opposite) things: the effect of a tiny (“marginal”) change in a predictor on the outcome (a slope or derivative), and quantities that have been averaged (“marginalized”) over other variables or observations:

What are MEs?

\(y = 2x - 1\)

What are MEs?

\(y = -0.5x^2 + 5x\)

❓is the “marginal” slope (derivative) constant?
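A quick check in base R: the derivative of \(y = -0.5x^2 + 5x\) is \(-x + 5\), so the marginal slope depends on where you evaluate it:

```r
f <- function(x) -0.5 * x^2 + 5 * x  # the curve
slope <- function(x) -x + 5          # its derivative, taken by hand

slope(c(0, 5, 10))  # 5 at x = 0, 0 at the peak, -5 past it

# compare to a numerical derivative at x = 2
(f(2 + 1e-6) - f(2)) / 1e-6  # approximately slope(2) = 3
```

Contrast this with \(y = 2x - 1\), where the marginal slope is 2 everywhere.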

What are MEs?

What are MEs?

Shout out to Andrew Heiss’ blog again for allll the plotting code used in the examples in this subsection.

Why MEs?

Marginal Effects can help you explain/plot effects that are otherwise difficult to communicate.

Telling someone “The coefficient for \(x^2\) is \(5\)” is not something that will actually help them understand what the effect really means. Similarly, GLMs that use some confusing link functions can make it hard for non-experts (and experts) to really grok what effects indicate. Marginal Effects can help!

We covered about 0.001% of what Marginal Effects are; I highly recommend the book Model to Meaning by Vincent Arel-Bundock. He also wrote the marginaleffects package in R and Python, which works with both Frequentist and Bayesian models!